Version Control

STAT 331

A refresher on coding best practices…

mutate() vs summarise()
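
A minimal sketch of the difference, using a small made-up tibble (the values below are hypothetical, not the course avocado data). mutate() keeps every row and adds a column; summarise() collapses each group to one row.

```r
library(dplyr)

# Toy data: two made-up price differences per region (hypothetical values)
prices <- tibble(
  geography  = c("San Diego", "San Diego", "Sacramento", "Sacramento"),
  price_diff = c(0.6, 0.8, 0.5, 0.7)
)

# mutate(): same number of rows, group mean repeated on every row
with_mean <- prices |> 
  group_by(geography) |> 
  mutate(mean_price_diff = mean(price_diff)) |> 
  ungroup()

# summarise(): one row per group
region_means <- prices |> 
  group_by(geography) |> 
  summarise(mean_price_diff = mean(price_diff))

nrow(with_mean)     # 4 rows, one per original observation
nrow(region_means)  # 2 rows, one per region
```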

Better alternatives to bar plots

Bar plots are typically reserved for displaying frequencies

# A tibble: 4 × 3
  geography     mean_price_diff sd_price_diff
  <fct>                   <dbl>         <dbl>
1 San Francisco           0.719         0.334
2 San Diego               0.685         0.211
3 Sacramento              0.578         0.270
4 Los Angeles             0.528         0.188
Code
library(ggplot2)
library(RColorBrewer)
library(scales)

diff_summary |> 
  ggplot(aes(x = mean_price_diff, 
             y = geography,
             fill = geography)
         ) +
  # geom_col() draws bars whose length is the value in the data
  geom_col() +
  labs(
    title = "Difference in Price between Organic and Conventional Avocados",
    y = "") +
  theme_minimal() +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Dark2") +
  scale_x_continuous(name = "", 
                     labels = scales::label_dollar()
                     )

Read more about Cleveland Dot Plots

Code
diff_summary |> 
  # arrange() does not change the axis order of a factor; reorder its levels instead
  mutate(geography = forcats::fct_reorder(geography, mean_price_diff)) |> 
  ggplot(aes(x = mean_price_diff, 
             y = geography,
             color = geography)  # points and segments are colored with color, not fill
         ) +
  geom_segment(aes(xend = 0,
                   yend = geography)
  ) +
  geom_point() +
  labs(title = "Difference in Price between Organic and Conventional Avocados",
       y = "") +
  theme_minimal() +
  theme(legend.position = "none") +
  scale_color_brewer(palette = "Dark2") + 
  scale_x_continuous(name = "", 
                     labels = scales::label_dollar()
                     )

Little Bits and Bobs

What is wrong with this code?

hiphop %>% 
  distinct(subj, .keep_all = TRUE) %>%
  filter(sex == "Male" & 
         between(age, 17, 23) &
         between(city, 10000, 60000)
         ) %>%
  slice_max(bieber) %>%
  select(subj)

What is wrong with this code?

avocado %>% 
  rename(Size_Small = `4046`) %>% 
  rename(Size_Large = `4225`) %>% 
  rename(Size_XL = `4770`)

What is the difference between facet_wrap() and facet_grid()?

When should you use facet_grid()?
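
One way to see the difference, sketched with ggplot2's built-in mpg data (an assumption for illustration, not a course dataset). facet_wrap() lays out the panels for one variable in a wrapped ribbon; facet_grid() crosses a row variable with a column variable.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# One faceting variable: panels wrap onto as many rows as needed
wrapped <- base_plot + facet_wrap(~ class)

# Two faceting variables: rows are drv, columns are cyl
gridded <- base_plot + facet_grid(drv ~ cyl)

wrapped
gridded
```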

Version Control

Git vs GitHub


Git

  • A version control system
  • Developed by Linus Torvalds, creator of the Linux kernel (the basis of Android and Chrome OS)
  • Used from the command line or a GUI

GitHub

  • Cloud-based hosting service for Git repositories
  • Basic services are free
  • Advanced services are paid (similar to RStudio)

Why GitHub?

  1. A structured way for tracking changes to files over the course of a project.

  2. Makes it easy to have multiple people working on the same files at the same time.

  3. You can host fun things (like the class text, these slides, a personal website, etc.) at a public URL with GitHub Pages.

Think “track changes” or Dropbox history, but more structured.

Git Repositories

  • Think of this as a folder-directory for a single project (like your stat-331 folder!)

  • You may have code, documentation, data, TODO lists, and more associated with a project.

  • To create a repository, you can start with your local computer first, or you can start with the remote (online) repository first.

Actions in Git

Cloning a Repo

Clone = create an exact copy locally

Committing

Git tracks changes to each file that it is told to monitor. As the files change, you save snapshots of those changes (called “commits”), each with a short message describing what the changes were and why they exist.

Here, we commit the red line as a change to our file.

The log of these changes (along with the file history) is called your git commit history. This means you can always go back to old copies!
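
A minimal sketch of two commits and the resulting history, assuming the {gert} R package (installed alongside {usethis}; in class you may use the RStudio Git pane instead). The folder, file name, and author below are hypothetical.

```r
library(gert)

# A throwaway repository in a temporary folder (hypothetical location)
repo <- tempfile("demo-repo-")
dir.create(repo)
git_init(repo)

# A made-up author, so the sketch does not depend on your global Git settings
me <- git_signature("Demo Student", "student@example.com")

# First version of a file: stage it, then commit with a short message
script <- file.path(repo, "analysis.R")
writeLines("x <- 1", script)
git_add("analysis.R", repo = repo)
git_commit("Add analysis script", author = me, repo = repo)

# Change one line, then commit again
writeLines("x <- 2", script)
git_add("analysis.R", repo = repo)
git_commit("Update starting value of x", author = me, repo = repo)

# The commit history: one row per commit
history <- git_log(repo = repo)
history[, c("time", "message")]
```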

Pushing

Updates the copy of the repository on another machine (e.g. on GitHub) so that it has the most recent changes you’ve made on your machine.

Pulling

Updates your local copy of the repository (the copy on your computer) with the files that are “in the cloud” (on GitHub).

Pushing and Pulling

Merge Conflicts

Occur when you make changes to the same line as a collaborator either at the same time, or without starting from the same “state”.

  1. Maybe you are working in real time on the same line of code or text.
  2. Maybe you forgot to push your changes last time you finished working.
  3. Maybe you forgot to pull your collaborator’s changes before you started working this time.

Workflow

Starting a new project/local repo

  1. Clone the project or create a new repository
  2. Make some changes
  3. Commit the changes
  4. Pull any changes from the remote repository
  5. Resolve any merge conflicts
  6. Push the changes (and merged files)

Working with an existing local repo

  1. Pull the repo (especially if collaborating)
  2. Make some changes
  3. Commit the changes
  4. Pull any changes from the remote repository (again!)
  5. Resolve any merge conflicts
  6. Push the changes (and merged files)
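
The steps above can be sketched as two small helpers, assuming the {gert} R package and a repository that already has a remote on GitHub. The function names start_session() and finish_session() are hypothetical, and the sketch does not resolve merge conflicts for you.

```r
library(gert)

# Hypothetical helper: run when you sit down to work (step 1)
start_session <- function(repo = ".") {
  git_pull(repo = repo)
}

# Hypothetical helper: run when you are done working (steps 3, 4, and 6)
finish_session <- function(files, message, repo = ".") {
  git_add(files, repo = repo)       # stage the files you changed
  git_commit(message, repo = repo)  # commit them with a short message
  git_pull(repo = repo)             # pull again; resolve any conflicts by hand (step 5)
  git_push(repo = repo)             # push your commits to the remote
}

# Usage (not run here, since it needs a real remote):
# start_session()
# finish_session("analysis.R", "Add summary table")
```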

Connect GitHub to RStudio

R packages we will need

Work in your console or an R script for this…

  1. Install and load the {usethis} R package
install.packages("usethis")
library(usethis)
  2. Install and load the {gitcreds} R package
install.packages("gitcreds")
library(gitcreds)

Generate your PAT (Personal Access Token)

  1. Generate token
create_github_token()

Warning

GitHub really doesn’t like it when you do not have a PAT expiration date… but I don’t ever want to deal with it again. Make sure your expiration date is AT LEAST through the end of the quarter (60 days).

Store your PAT

  1. Copy your PAT from GitHub

  2. Run gitcreds_set() in RStudio and, at the “Enter password or token:” prompt, paste your PAT
gitcreds_set()

Verify PAT

You should be good to go! Let’s verify.

git_sitrep()
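
A second, optional check, sketched under the assumption that your PAT was stored for https://github.com with {gitcreds}. It only reports whether a credential is found and never prints the token itself.

```r
library(gitcreds)

# gitcreds_get() signals an error when no credential is stored,
# so the error is caught and turned into NULL
cred <- tryCatch(gitcreds_get(), error = function(e) NULL)

if (is.null(cred)) {
  message("No GitHub credential found; try gitcreds_set() again.")
} else {
  message("Found a stored credential for host: ", cred$host)
}
```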

Tips for avoiding merge conflicts

  • Always pull before you start working and always push after you are done working!

  • In general, if you follow the workflow for an existing local repo exactly, you only have problems if two of you are making local changes to the same line in the same file at the same time.

  • If you are working with collaborators in real time, pull, commit, and push often.

  • Git commits lines – lines of code, lines of text, etc.

    • Practice good code format and put each sentence on its own line.

Creating your STAT 331/531 portfolio repository

Forking a Repository

Fork the STAT331_portfolio_template repository

Create a New RStudio Project

Save Your Project

Do not save your project in the same folder as your STAT331.Rproj!!!!

Here are some options:

  1. Save on your Desktop
  2. Make an “umbrella” STAT 331 folder that has two (2) subfolders
    • Your STAT331-portfolio-template
    • Your STAT331 folder (the one you’ve been working in for five weeks)